Features 

• Full 64-bit Alpha architecture: 

- Advanced RISC architecture 

- Optimized for high performance 
implementations 

- Multiprocessor support 

- IEEE single and double 
precision, VAX F_floating 
and G_floating, longword 
and quadword data types 

- Cycle counter for code 
optimization 

• Privileged Architecture Library 
Code (PALcode) supports: 

- Optimization for multiple 
operating systems 

- Flexible memory management 
implementations 

- Multi-instruction atomic 
sequences 

• Ultra-high performance Alpha 
implementation: 

- Dual-pipelined architecture 

- 150 MHz cycle time 

- Peak instruction execution of 
300 million operations per 
second 

• On-chip write buffer with four 
32-byte entries 

• Selectable data bus width and 
speed: 

- 64 or 128 bit data width 

- 75 MHz to 18.75 MHz bus 
speed 


• On-chip pipelined floating point 
unit 

• 8K byte data cache 

• 8K byte instruction cache 

• External cache memory support: 

- On-chip external secondary 
cache control 

- Programmable cache size and 
speed 

• On-chip demand paged memory 
management unit: 

- 12 entry I-stream TB with 8 
entries for 8K byte pages and 
4 entries for 4M byte pages 

- 32 entry D-stream TB with 
each entry able to map 8K, 
64K, 512K, or 4M byte pages 

• On-chip parity and ECC generators 
and checkers 

• Internal clock generator provides: 

- High-speed chip clock 

- Pair of programmable system 
clocks (CPU/2 to CPU/8) 

• Programmable on-chip perform- 
ance counters measure CPU and 
system performance 

• Chip and module level test 
support 

• 3.3-volt supply voltage 

- Lower power 

- Higher reliability 

- Interface to 5-volt logic 
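
The clock figures in the feature list are related by simple arithmetic. The following is a minimal sketch, assuming the peak rate is the chip clock multiplied by the two instructions issued per cycle and that the bus speeds are the chip clock divided by 2 through 8. The function and constant names are illustrative and do not come from the data sheet.

```python
# Figures taken from the feature list; names are illustrative only.
CPU_CLOCK_MHZ = 150.0   # chip clock
ISSUE_WIDTH = 2         # dual instruction issue


def peak_mops(clock_mhz, issue_width):
    """Peak rate in millions of operations per second, assuming no stalls."""
    return clock_mhz * issue_width


def cycle_time_ns(clock_mhz):
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def system_clock_mhz(clock_mhz, divisor):
    """System clock obtained by dividing the chip clock (CPU/2 to CPU/8)."""
    return clock_mhz / divisor
```

With these definitions, `peak_mops(150.0, 2)` gives the 300 million operations per second quoted above, and divisors of 2 and 8 give the 75 MHz and 18.75 MHz ends of the bus speed range.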


Description 

Digital’s DECchip 21064-AA microprocessor is the first in a family of chips to 
implement Digital’s Alpha architecture. The DECchip 21064-AA microprocessor 
is a 0.75 micron CMOS based super-scalar super-pipelined processor using 
dual instruction issue and a 150 MHz cycle time. The Alpha architecture is a 
64-bit RISC architecture designed with particular emphasis on speed, multiple 
instruction issue, multiple processors, and software migration from VAX/VMS and 
MIPS/ULTRIX operating environments. 


DECchip 21064-AA 
MicroArchitecture 

The DECchip 21064-AA microproc- 
essor consists of four independent 
functional units: the integer execu- 
tion unit (Ebox), floating point unit 
(Fbox), the load/store or address unit 
(Abox) and the branch unit. Other 
sections include the central control 
unit (Ibox) and the I and D cache. 

Ebox - Contains a 64-bit fully 
pipelined integer execution data path 
including: adder, logic box, barrel 
shifter, byte extract and mask, and 
independent integer multiplier. The 
Ebox also contains a 32-entry 64-bit 
integer register file. 

Fbox - Contains a fully pipelined 
floating point unit and independent 
divider, supporting both IEEE and 
VAX floating point data types. 


IEEE single precision and double 
precision floating point data types 
are supported. VAX F_floating 
and G_floating data types are fully 
supported, with limited support for 
the D_floating data type. 

Abox - Contains five major sections: 
address translation data path, load 
silo, write buffer, data cache 
(Dcache) interface, and the external 
bus interface unit (BIU). 

The Abox supports all integer and 
floating point load and store instruc- 
tions, including address calculation 
and translation, and cache control 
logic. 

Ibox - Performs instruction fetch, 
resource checks, and dual instruction 
issue to the Ebox, Abox, Fbox, or 
branch unit. In addition, the Ibox 
controls pipeline stalls, aborts and 
restarts. 
Pipeline Organization 


The DECchip 21064-AA microproc- 
essor uses a seven stage pipeline for 
integer operate and memory refer- 
ence instructions, and a ten stage 
pipeline for floating point operate 
instructions. The Ibox maintains 
state for all pipeline stages to track 
outstanding register writes. 

Cache Organization 

The DECchip 21064-AA microproc- 
essor contains two on-chip caches, 
data cache (Dcache) and instruction 
cache (Icache). The chip also sup- 
ports an external cache. 

Dcache - Contains 8K bytes and is a 
write through, direct mapped, read- 
allocate physical cache with 32-byte 
blocks. 

Icache - Contains 8K bytes and is a 
physical direct-mapped cache with 
32-byte blocks. 
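
The cache geometry above fixes how an address is split. The following is a minimal sketch, assuming a plain direct-mapped lookup in which the low bits select the byte within a 32-byte block and the next bits select one of the blocks. The function and constant names are illustrative, not taken from the data sheet.

```python
# Geometry from the text: 8K bytes, direct mapped, 32-byte blocks.
CACHE_BYTES = 8 * 1024
BLOCK_BYTES = 32
NUM_BLOCKS = CACHE_BYTES // BLOCK_BYTES        # 256 blocks
OFFSET_BITS = BLOCK_BYTES.bit_length() - 1     # 5 bits: address bits 4:0
INDEX_BITS = NUM_BLOCKS.bit_length() - 1       # 8 bits: address bits 12:5


def split_address(paddr):
    """Split a physical address into (tag, index, offset) for this geometry."""
    offset = paddr & (BLOCK_BYTES - 1)
    index = (paddr >> OFFSET_BITS) & (NUM_BLOCKS - 1)
    tag = paddr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

Under this assumption, two addresses exactly 8K bytes apart share an index and differ only in the tag, so they compete for the same block in a direct-mapped cache.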


Characteristics 

Power Supply                          Vss 0.0 V, Vdd 3.3 V ±5% 

Operating Temperature (with proper    0°C to 70°C 
heatsink and airflow) 

Storage Temperature Range             -55°C to 125°C 

Power Dissipation @ Vdd = 3.45 V,     23 W typical, 27.5 W maximum 
Speed = 6.6 ns 


External Cache - The DECchip 
21064-AA supports external cache 
built from off-the-shelf static 
RAMs. The DECchip 21064-AA 
directly controls the RAMs using its 
programmable external cache inter- 
face, allowing each implementation 
to make its own external cache speed 
and configuration trade-offs. 

The external cache interface supports 
cache sizes from 0 to 8M bytes and 
a range of operating speeds which 
are sub-multiples of the chip clock. 

Virtual Address Space 

The virtual address is a 64-bit 
unsigned integer that specifies a byte 
location within the virtual address 
space. The DECchip 21064-AA mi- 
croprocessor checks all 64 bits of a 
virtual address and implements a 43- 
bit subset of the address space. The 
DECchip 21064-AA supports a 
physical address space of 16G bytes. 
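
The Virtual Address Space figures can be illustrated with a short sketch. The text says only that all 64 bits are checked and that a 43-bit subset is implemented. The rule used here, that bits 63:43 must all equal bit 42, is an assumption about how that check works, and the names are illustrative.

```python
VA_BITS = 43                 # implemented virtual address bits (from the text)
PA_BITS = 34                 # 16G bytes of physical space = 2**34
PAGE_BYTES = 8 * 1024        # smallest page size named in the feature list
UPPER_BITS = 64 - VA_BITS + 1   # bits 63:42, which must agree (assumption)


def is_valid_va(va):
    """Assumed check: bits 63:43 are all copies of bit 42."""
    va &= (1 << 64) - 1
    upper = va >> (VA_BITS - 1)
    return upper == 0 or upper == (1 << UPPER_BITS) - 1


def split_va(va):
    """Return (virtual page number, byte within page) for an 8K byte page."""
    low = va & ((1 << VA_BITS) - 1)
    return low // PAGE_BYTES, low % PAGE_BYTES
```

For example, `is_valid_va((1 << 42) - 1)` is true under this rule while `is_valid_va(1 << 43)` is false, and `1 << PA_BITS` equals the 16G bytes of physical address space stated above.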
Alpha Architecture 
Summary 

The DECchip 21064-AA microproc- 
essor implements the Alpha architec- 
ture. The Alpha architecture sup- 
ports: 

• A fixed 32-bit instruction size 

• Separate integer and floating 
point registers 

- 32 64-bit integer registers 

- 32 64-bit floating point 
registers 

• 32-bit (longword) and 64-bit 
(quadword) integer along with 
32-bit and 64-bit IEEE and VAX 
floating-point data types 

• Memory access using a 64-bit 
virtual byte address 

• Privileged Architecture Library 
Code (PALcode) 

Instruction Set 

Instructions are all 32 bits in length 
using four different instruction for- 
mats specifying 0, 1, 2, or 3 5-bit 
register fields. Each format uses a 6- 
bit opcode. 


CALL_PAL 

Branch 

Memory 

Operate 


Branch Instructions - Conditional 
branch instructions test a register for 
positive/negative, zero/nonzero, or 
even/odd, and perform a PC relative 
branch. Unconditional branch 
instructions perform either a PC 
relative or absolute jump using an 
arbitrary 64-bit register value. They 
can update a destination register with 
a return address. 

Load/Store Instructions - can move 
either 32-bit or 64-bit quantities. 

8-bit and 16-bit load/store operations 
are supported through an extensive 
set of in-register byte manipulations. 
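
The idea of handling bytes inside a 64-bit register can be sketched as follows. This is an illustration of the technique only, assuming little-endian byte numbering within a quadword and a toy memory modeled as a dictionary. The function names are hypothetical and are not Alpha instruction mnemonics.

```python
MASK64 = (1 << 64) - 1


def extract_byte(quad, byte_addr):
    """Pull one byte out of a quadword, selected by the low 3 address bits."""
    shift = (byte_addr & 7) * 8
    return (quad >> shift) & 0xFF


def insert_byte(quad, byte_addr, value):
    """Clear the selected byte of a quadword and merge in a new value."""
    shift = (byte_addr & 7) * 8
    cleared = quad & ~(0xFF << shift) & MASK64
    return cleared | ((value & 0xFF) << shift)


def store_byte(memory, byte_addr, value):
    """Byte store as quadword load, in-register merge, quadword store."""
    base = byte_addr & ~7
    memory[base] = insert_byte(memory.get(base, 0), byte_addr, value)


def load_byte(memory, byte_addr):
    """Byte load as quadword load followed by an in-register extract."""
    return extract_byte(memory.get(byte_addr & ~7, 0), byte_addr)
```

In this model the memory system only ever sees whole quadwords, and the byte selection happens in the register operations.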

Integer Operate Instructions - 
manipulate full 64-bit values, and 
include a full complement of 
arithmetic, compare, logical, and 
shift instructions. In addition there 
are three 32-bit integer operates: add, 
subtract, and multiply. 

In addition to the operation of 
conventional RISC architectures, 
the Alpha architecture provides 
scaled add/subtract for quick 
subscript calculation, 128-bit 
multiply for division by a constant 
and multi-precision arithmetic, 
conditional moves for avoiding 
branches, and an extensive set of 
in-register byte manipulation 
instructions. 
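
Three of the operations named above can be sketched with ordinary integer arithmetic. This is a minimal illustration under stated assumptions: a scale factor of 4, division by the constant 10, and a move-if-zero condition are all chosen for the example, and the function names are hypothetical rather than Alpha mnemonics.

```python
MASK64 = (1 << 64) - 1


def scaled_add4(index, base):
    """Scaled add: index * 4 + base, wrapped to 64 bits (subscript address)."""
    return ((index << 2) + base) & MASK64


def mul_high(a, b):
    """Upper 64 bits of the 128-bit unsigned product of two 64-bit values."""
    return ((a & MASK64) * (b & MASK64)) >> 64


# Reciprocal constant for unsigned division by 10: ceil(2**67 / 10).
MAGIC_DIV10 = 0xCCCCCCCCCCCCCCCD


def div10(n):
    """Divide a 64-bit unsigned value by 10 using a multiply and a shift."""
    return mul_high(n, MAGIC_DIV10) >> 3


def move_if_zero(cond, new, old):
    """Conditional move: select new when cond is zero, without a branch."""
    return new if cond == 0 else old
```

The division replaces a divide with one high multiply and a shift, which is the use of the 128-bit multiply that the paragraph describes.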


Format      Fields 

CALL_PAL    OP | Number 

Branch      OP | RA | Displacement 

Memory      OP | RA | RB | Displacement 

Operate     OP | RA | RB | Function | RC 
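
The four formats can be decoded with shifts and masks. The data sheet gives only the field names, the 6-bit opcode, and the 5-bit register fields. The bit positions used below (opcode in bits 31:26, RA in 25:21, RB in 20:16, a 16-bit memory displacement, a 21-bit branch displacement, the function field in 11:5, RC in 4:0) are assumptions drawn from general descriptions of the Alpha architecture, the operate decoder covers only the register form, and the function names are illustrative.

```python
def bits(word, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Interpret a width-bit field as a two's complement number."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def decode_call_pal(word):
    return {"op": bits(word, 31, 26), "number": bits(word, 25, 0)}


def decode_branch(word):
    return {"op": bits(word, 31, 26), "ra": bits(word, 25, 21),
            "disp": sign_extend(bits(word, 20, 0), 21)}


def decode_memory(word):
    return {"op": bits(word, 31, 26), "ra": bits(word, 25, 21),
            "rb": bits(word, 20, 16),
            "disp": sign_extend(bits(word, 15, 0), 16)}


def decode_operate(word):
    # Register form only; assumed layout, see the note above.
    return {"op": bits(word, 31, 26), "ra": bits(word, 25, 21),
            "rb": bits(word, 20, 16), "function": bits(word, 11, 5),
            "rc": bits(word, 4, 0)}
```

Each decoder yields 0, 1, 2, or 3 register fields, matching the count given for the four formats.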


CALL_PAL Instructions - vector 
to a privileged library of software 
that atomically performs both 
privileged and unprivileged 
functions. 
Floating-Point Operate Instruc- 
tions - include four complete sets of 
instructions for IEEE single, IEEE 
double, VAX F_floating and VAX 
G_floating arithmetic. In addition to 
arithmetic instructions there are also 
instructions for conversions between 
floating and integer values including 
the VAX D_floating data type. 

Privileged Architecture Library 
Code 

PALcode is a privileged library of 
software that atomically performs 
such functions as the dispatching and 
servicing of interrupts, exceptions, 
task switching, and additional 
privileged and unprivileged user 
instructions as specified by 
operating systems using the 
CALL_PAL instruction. 

PALcode is the only method of 
performing some operations on the 
hardware. In addition to the entire 
instruction set, a set of implementa- 
tion specific instructions is provided. 

PALcode runs in an environment 
with privileges enabled, instruction 
stream mapping disabled, and 
interrupts disabled. Disabling 
memory mapping allows PALcode to 
support functions such as TB miss 
routines. Disabling interrupts allows 
the instruction stream to provide 
multi-instruction sequences as 
atomic operations. 


Memory Management 

The Alpha memory management 
architecture is designed to provide: 

• A large address space 
for instructions and data 

• Convenient and efficient sharing 
of instructions and data. 

• Independent read and write ac- 
cess protection 

• Flexibility through programma- 
ble PALcode support 
Alpha Architecture Compared to Conventional RISC Architecture 

The Alpha architecture is different from conventional RISC architectures in a number of ways: 


Feature - Difference 

64-Bit Architecture - True 64-bit architecture with 64-bit data and address. Not a 32-bit 
architecture that was later expanded to 64 bits. 

High Speed - The Alpha architecture was designed to allow very high-speed implementations. 
Simple instructions make it particularly easy to build implementations that issue multiple 
instructions every CPU cycle. There are no implementation specific pipeline timing 
hazards, no load delay slots, and no branch delay slots. 

Multiprocessor Support - The Alpha architecture does not enforce strict read/write ordering 
between multiple processors. This allows multiprocessor implementations to easily use 
features such as: multibank caches, bypassed write buffers, write merging, and pipelined 
writes with retry on error. To maintain strict ordering between accesses as seen by a second 
processor, memory barrier instructions can be explicitly inserted in the program. The basic 
multiprocessor interlocking primitive is a RISC style load_locked, modify, store_conditional 
sequence. If the sequence runs without interrupt, exception, or an interfering write from 
another processor, the store succeeds. Otherwise, the store fails and the program 
eventually must branch back and retry the sequence. 

Multiple Operating Systems - The Alpha architecture provides flexibility by allowing the user 
to implement a privileged library of software for operating system specific operations. This 
allows Alpha to run full VMS using one version of this software library that mirrors many of 
the VAX operating system features, and to run OSF/1 using a different version that mirrors 
many of the MIPS operating system features. Additional operating system implementations 
can be efficiently supported. 

Byte Manipulation - The Alpha architecture is unconventional in the approach to byte 
manipulation. Byte loads, stores, and operations are done with normal 64-bit instructions, 
crafted to keep the sequences short. Single-byte stores found in conventional RISC 
architectures force cache and memory implementations to include hardware byte operations 
and implement read-modify-write cycles which can complicate system design and reduce 
performance. 

Arithmetic Traps - In contrast to conventional RISC architectures, the reporting of Alpha 
architecture arithmetic traps (overflow, underflow, and others) is imprecise. This removes 
architectural bottlenecks that affect performance. If precise arithmetic exceptions are 
desired, trap barrier instructions can be explicitly inserted in the program to force traps to 
be delivered at specific points. 

HINTS - Alpha architecture includes a number of implementation-specific HINTS aimed at 
allowing higher performance. Software is able to provide HINTS to the hardware that enable 
the hardware to optimize its operation. HINTS can help improve the utilization of the 
pipeline, cache memory, and translation lookaside buffers. 
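
The load_locked, modify, store_conditional sequence described under Multiprocessor Support can be modeled in a few lines. This is a toy simulation under stated assumptions: one memory location, a lock flag per processor, and any successful write clearing every outstanding lock. The class and function names are hypothetical and the model ignores interrupts and exceptions.

```python
class LockedCell:
    """Toy single-location memory with a lock flag per processor."""

    def __init__(self, value=0):
        self.value = value
        self._locked = set()

    def load_locked(self, cpu):
        """Read the value and remember that this processor holds a lock."""
        self._locked.add(cpu)
        return self.value

    def store_conditional(self, cpu, new_value):
        """Write only if this processor's lock is still intact."""
        if cpu not in self._locked:
            return False
        self.value = new_value
        self._locked.clear()
        return True

    def store(self, new_value):
        """Ordinary write from another processor; breaks all locks."""
        self.value = new_value
        self._locked.clear()


def atomic_add(cell, cpu, amount, interfere=None):
    """Retry loop: returns the number of attempts the sequence needed."""
    attempts = 0
    while True:
        attempts += 1
        old = cell.load_locked(cpu)
        if interfere is not None and attempts == 1:
            interfere(cell)          # simulated write from another processor
        if cell.store_conditional(cpu, old + amount):
            return attempts
```

When nothing interferes the sequence succeeds on the first attempt. When another write lands between the load and the store, the store fails and the loop branches back, as the text describes.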
Signals 


Name              Type          Function 

adr_h 33:5        Input/Output  Address bus 
data_h 127:0      Input/Output  Data bus 
check_h 27:0      Input/Output  Check bit bus 
dOE_l             Input         Data bus output enable 
dWSel_h 1:0       Input         Data bus write data select 
dRAck_h 2:0       Input         Data bus data acknowledge 
tagCEOE_h         Output        External cache RAM tagCtl, tagAdr CE/OE 
tagCtlWE_h        Output        External cache RAM tagCtl WE 
tagCtlV_h         Input/Output  Tag valid 
tagCtlS_h         Input/Output  Tag shared 
tagCtlD_h         Input/Output  Tag dirty 
tagCtlP_h         Input/Output  Tag V/S/D parity 
tagAdr_h 33:17    Input         Tag address 
tagAdrP_h         Input         Tag address parity 
tagOK_h,_l        Input         Tag access from CPU is ok 
tagEq_l           Output        Tag compare output 
dataCEOE_h 3:0    Output        External cache RAM data CE/OE, longword 
dataWE_h 3:0      Output        External cache RAM data WE, longword 
dataA_h 4:3       Output        External cache RAM data A 4:3 
holdReq_h         Input         Hold request 
holdAck_h         Output        Hold acknowledge 
cReq_h 2:0        Output        Cycle request 
cWMask_h 7:0      Output        Cycle write mask 
cAck_h 2:0        Input         Cycle acknowledge 
iAdr_h 12:5       Input         Invalidate address, Dcache 
dInvReq_h         Input         Invalidate request, Dcache 
dMapWE_h          Output        External Dcache duplicate tag RAM WE 
irq_h 5:0         Input         Interrupt request 
sRomOE_l          Output        Serial ROM output enable 
sRomD_h           Input         Serial ROM data/Rx data 
sRomclk_h         Output        Serial ROM clock/Tx data 
vRef              Input         Input reference 
eclOut_h          Input         Output mode selection 
perf_cnt_h 1:0    Input         Performance counter inputs 
threestate_l      Input         Three state for testing 
icMode_h 1:0      Input         Icache Test Mode Selection 
cont_l            Input         Continuity for testing 
clkIn_h,_l        Input         Clock input 
testClkIn_h,_l    Input         Clock input for testing 
cpuClkOut_h       Output        CPU clock output 
sysClkOut1_h,_l   Output        System clock output, normal 
sysClkOut2_h,_l   Output        System clock output, delayed 
dcOk_h            Input         Power and Clocks ok 
reset_l           Input         Reset 
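
The bit ranges in the signal names can be checked against figures given elsewhere in this document. The following is a minimal sketch with an illustrative helper name. Reading the missing low five address bits as 32-byte block granularity is an inference, not something the signal table states.

```python
def bus_width(bit_range):
    """Number of signal lines in a range written as 'hi:lo'."""
    hi, lo = (int(part) for part in bit_range.split(":"))
    return hi - lo + 1


DATA_LINES = bus_width("127:0")     # data_h
ADDRESS_LINES = bus_width("33:5")   # adr_h
CHECK_LINES = bus_width("27:0")     # check_h

# adr_h tops out at bit 33, so byte addresses span 2**34 locations.
PHYSICAL_BYTES = 1 << (33 + 1)

# adr_h starts at bit 5, so each address names a 2**5 byte unit (inference).
ADDRESS_GRANULE_BYTES = 1 << 5
```

The results agree with the 128 bit data width, the 16G byte physical address space, and the 32-byte cache block size quoted in the earlier sections.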
Information 


For more information on Digital’s 
DECchip 21064-AA Microprocessor 
call: 

1-800-DEC-2717 

1-800-DEC-2515 TTY 

Orders may be placed through 
Digital’s Technical OEM (TOEM) 
Sales Representatives. Call your 
local Digital Sales Office for details. 